1. Technical Field
The present disclosure relates to improving the quality of a speech unit selection database and, more specifically, to modifying parts of the speech unit selection database and then adding the modified speech units back into the database for use in future speech generation.
2. Introduction
Speech unit selection synthesis can generate very natural audio output but cannot be relied upon to produce consistently good audio output. For example, the quality of the speech produced depends heavily on the size and quality of the database of speech samples being used. To improve quality, speech unit selection synthesis can use domain-specific databases of speech samples, such that in-domain text for a domain-specific database produces high-quality speech, while out-of-domain text produces poor-quality speech. Previous techniques tend to focus on the segmental level, or on repurposing data from other voices or databases to boost the effective size of a database.